MROrchestrator: A Fine-Grained Resource Orchestration Framework for Hadoop MapReduce

Authors

  • Bikash Sharma
  • Ramya Prabhakar
  • Seung-Hwan Lim
  • Mahmut T. Kandemir
  • Chita R. Das
Abstract

Efficient resource management in data centers and clouds running large distributed data processing frameworks like Hadoop is crucial for enhancing the performance of hosted MapReduce applications and boosting resource utilization. However, existing resource scheduling schemes in Hadoop allocate resources at the granularity of fixed-size, static portions of the nodes, called slots. A slot represents a multi-dimensional resource slice, consisting of CPU, memory and disk on a machine. In this work, we show that MapReduce jobs have widely varying demands for multiple resources, making static, fixed-size slot-level resource allocation a poor choice from both the performance and resource utilization standpoints. Furthermore, lack of coordination in the management of multiple resources across the nodes prevents dynamic slot reconfiguration and leads to resource contention. Towards this end, we perform a detailed experimental analysis of the performance implications of slot-based resource scheduling in Hadoop. Based on the insights, we propose the design and implementation of MROrchestrator, a MapReduce resource Orchestrator framework that can dynamically identify resource bottlenecks and resolve them through fine-grained, coordinated, and on-demand resource allocations. We have implemented MROrchestrator on two 24-node native and virtualized Hadoop clusters. Experimental results with a suite of representative MapReduce benchmarks demonstrate up to 38% reduction in job completion times and up to 25% increase in resource utilization. We further show how popular resource managers like NGM and Mesos, when augmented with MROrchestrator, can boost their performance.
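The abstract's central claim, that fixed-size slots waste capacity when task demands vary across resources, can be illustrated with a small sketch. The code below is not from the paper; the node capacities, slot size, and task demands are hypothetical numbers chosen only to show why demand-aware, fine-grained packing can place more tasks than worst-case-sized slots.

```python
# Illustrative sketch (hypothetical numbers, not from the MROrchestrator paper):
# contrasts fixed-size slot allocation with fine-grained, demand-aware
# allocation of multiple resources (CPU, memory) on a single node.

def slots_fit(node_capacity, slot_size):
    """Fixed slots: each task occupies a full slot regardless of its demand."""
    return min(node_capacity[r] // slot_size[r] for r in node_capacity)

def demand_aware_fit(node_capacity, task_demands):
    """Fine-grained: greedily pack tasks by their actual per-resource demand."""
    remaining = dict(node_capacity)
    placed = 0
    for demand in task_demands:
        if all(remaining[r] >= demand[r] for r in demand):
            for r in demand:
                remaining[r] -= demand[r]
            placed += 1
    return placed

# Hypothetical node: 8 CPU cores, 16 GB RAM.
node = {"cpu": 8, "mem": 16}

# Slots sized for the worst-case task (2 cores, 4 GB) yield only 4 slots.
print(slots_fit(node, {"cpu": 2, "mem": 4}))   # 4 tasks

# Actual demands vary widely; demand-aware packing fits more tasks.
tasks = [{"cpu": 1, "mem": 2}] * 6 + [{"cpu": 2, "mem": 4}] * 2
print(demand_aware_fit(node, tasks))           # 7 tasks
```

The same intuition drives the paper's fine-grained, coordinated allocation: sizing every allocation for the worst case leaves CPU or memory stranded whenever a task's real demand is smaller along any dimension.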


Related Articles

Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. The current Hadoop and the existing Hadoop distributed file system's rack-aware data placement strategy assume a homogeneous cluster, in which each node has the same computing capacity and is assigned the same workload. Default Hadoop d...


FMEM: A Fine-grained Memory Estimator for MapReduce Jobs

MapReduce is designed as a simple and scalable framework for big data processing. Due to the lack of resource usage models, its implementation Hadoop hands resource planning and optimization over to users. But users find it difficult to specify the right resource-related, especially memory-related, configurations without good knowledge of a job's memory usage. Modeling memory usage is chall...


Hadoop Memory Usage Model

Hadoop MapReduce is a powerful open-source framework for big data processing. For ordinary users, it is not hard to write MapReduce programs, but it is hard to specify memory-related configurations. To help users analyze, predict and optimize a job's memory consumption, this technical report presents a fine-grained memory usage model. The proposed model reveals the relationship among memory usage, d...


Large-scale incremental processing with MapReduce

An important property of today's big data processing is that the same computation is often repeated on datasets evolving over time, such as web and social network data. While repeating full computation over the entire dataset is feasible with distributed computing frameworks such as Hadoop, it is obviously inefficient and wastes resources. In this paper, we present HadUP (Hadoop with Update Proc...


Benchmarking and Performance Studies of MapReduce/Hadoop Framework on Blue Waters Supercomputer

MapReduce is an emerging and widely used programming model for large-scale data-parallel applications that need to process large amounts of raw data. There are several implementations of the MapReduce framework, among which Apache Hadoop is the most commonly used and open-source implementation. These frameworks are rarely deployed on supercomputers as massive as Blue Waters. We want to evaluate ho...



Publication date: 2012